Module 3: Data Structures: Vectors and Lists

Jacob Jameson
Summer 2021

1d Data Structures in R

R has two basic one-dimensional (1d) data structures:

  • (atomic) vectors
  • lists

(Atomic) vectors and lists are the most common and most basic data structures in R, and they are its workhorses.

The Different Vector Modes

A vector is a collection of elements, most commonly of mode character, logical, integer or numeric.

You can create an empty vector with vector(). (By default the mode is logical.)

vector() # an empty 'logical' (the default) vector
logical(0)
vector(mode="character", length = 5) # a vector of mode 'character' with 5 elements
[1] "" "" "" "" ""

It is more common to use direct constructors such as character(), numeric(), etc., which state the mode explicitly.

character(5) # the same thing, but using the constructor directly
[1] "" "" "" "" ""
numeric(5)   # a numeric vector with 5 elements
[1] 0 0 0 0 0
logical(5)   # a logical vector with 5 elements
[1] FALSE FALSE FALSE FALSE FALSE

You can also create vectors by directly specifying their content. R will then guess the appropriate mode of storage for the vector. For instance:

x = c(1, 2, 3)
x
[1] 1 2 3

will create a vector x of mode numeric. These are the most common kind, and are treated as double precision real numbers.

You can create vectors as a sequence of numbers.

1:10
 [1]  1  2  3  4  5  6  7  8  9 10
seq(10)
 [1]  1  2  3  4  5  6  7  8  9 10
seq(from = 1, to = 10, by = 0.5)
 [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
[16]  8.5  9.0  9.5 10.0
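
The vectors above are all numeric, but they are not all stored the same way. A minimal sketch, using typeof() (a base R function not otherwise used in this module) to show that the colon operator yields integers while c(1, 2, 3) and a fractional seq() yield doubles; the variable names are only illustrative:

```r
a = c(1, 2, 3)                        # plain numbers are stored as double
b = 1:3                               # the colon operator produces integers
d = seq(from = 1, to = 2, by = 0.5)   # a fractional step gives doubles
e = c(1L, 2L, 3L)                     # the L suffix requests integers

typeof(a)   # "double"
typeof(b)   # "integer"
typeof(d)   # "double"
typeof(e)   # "integer"

# mode() reports both storage types as "numeric"
mode(a)
mode(b)
```

In everyday use the distinction rarely matters, since R converts integers to doubles whenever a calculation needs it.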

Using TRUE and FALSE will create a vector of mode logical:

y = c(TRUE, TRUE, FALSE, FALSE)
y
[1]  TRUE  TRUE FALSE FALSE

While using quoted text will create a vector of mode character:

z = c("Andy", "Ben", "Charlie")
z
[1] "Andy"    "Ben"     "Charlie"

Examining Vectors

The functions length(), class() and str() provide useful information about your vectors and R objects in general.

length(z)
[1] 3
class(z)
[1] "character"
str(z)
 chr [1:3] "Andy" "Ben" "Charlie"

Adding Elements

The function c() (for combine) can also be used to add elements to a vector.

z = c(z, "Doug")
z
[1] "Andy"    "Ben"     "Charlie" "Doug"   
z = c("Eric", z)
z
[1] "Eric"    "Andy"    "Ben"     "Charlie" "Doug"   
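
c() only adds at the start or the end. As a sketch of one alternative, base R's append() takes an after argument that places the new values at a chosen position; the name "Zed" is made up for illustration:

```r
z = c("Andy", "Ben", "Charlie")

# Insert "Zed" after the first element; z itself is not modified
z2 = append(z, "Zed", after = 1)
z2           # "Andy" "Zed" "Ben" "Charlie"
length(z2)   # 4
```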

Accessing Elements by Index

We can access elements by their index (position). Indexing in R starts at 1.

z[3]
[1] "Ben"
z[2:4]
[1] "Andy"    "Ben"     "Charlie"
z[c(1,3)]
[1] "Eric" "Ben" 
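
Two further indexing behaviours are worth knowing. The sketch below rebuilds z so it runs on its own, and shows that negative indices drop elements and that an index past the end returns NA rather than an error:

```r
z = c("Eric", "Andy", "Ben", "Charlie", "Doug")

z[-1]          # everything except the first element
z[-c(1, 3)]    # everything except elements 1 and 3
z[length(z)]   # the last element
z[10]          # there is no 10th element, so the result is NA
```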

Accessing Elements by Logical Vector

A logical vector contains only the special values TRUE and FALSE. When it is used as an index, the elements at the TRUE positions are kept and the rest are dropped.

c(TRUE, TRUE, FALSE, FALSE, TRUE)
[1]  TRUE  TRUE FALSE FALSE  TRUE
z
[1] "Eric"    "Andy"    "Ben"     "Charlie" "Doug"   
z[c(TRUE, TRUE, FALSE, FALSE, TRUE)]
[1] "Eric" "Andy" "Doug"
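
The logical index does not have to be as long as the vector. As a small sketch, a shorter logical index is recycled until it covers every element, which gives a compact way to pick every other element:

```r
z = c("Eric", "Andy", "Ben", "Charlie", "Doug")

# c(TRUE, FALSE) is recycled to TRUE FALSE TRUE FALSE TRUE
every_other = z[c(TRUE, FALSE)]
every_other   # "Eric" "Ben" "Doug"
```

Recycling is convenient but easy to trigger by accident, so it is usually safer to build the index from the vector itself, for example with a comparison.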

Logical vectors can be created using relational operators such as <, >, == and !=, or the matching operator %in%.

x = c(1, 2, 3, 11, 12, 13)
x < 10
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
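
The other operators work the same way. A minimal sketch on the same x, also combining conditions with & (and) and | (or), and using which() to get the positions of the TRUE values; the variable names are only illustrative:

```r
x = c(1, 2, 3, 11, 12, 13)

in_set = x %in% c(2, 11, 99)   # is each element one of 2, 11 or 99?
in_set                         # FALSE TRUE FALSE TRUE FALSE FALSE

in_range = x > 1 & x < 12      # both conditions must hold
in_range                       # FALSE TRUE TRUE TRUE FALSE FALSE

extremes = x < 2 | x > 12      # either condition may hold
extremes                       # TRUE FALSE FALSE FALSE FALSE TRUE

which(x < 10)                  # positions of the TRUE values: 1 2 3
```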

Exercise

  1. Select all elements in a vector whose value < 10

  2. Change those values to 0

Select all elements in a vector whose value < 10

x[x < 10]
[1] 1 2 3

Change those values to 0

x[x < 10] = 0
x
[1]  0  0  0 11 12 13
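
The assignment above overwrites x. If the original values should be kept, one possible alternative is to build a new vector with replace() or ifelse(), both from base R. A sketch, with illustrative variable names:

```r
x = c(1, 2, 3, 11, 12, 13)

# replace() returns a modified copy
x_replaced = replace(x, x < 10, 0)

# ifelse() picks 0 where the test is TRUE and the old value otherwise
x_ifelse = ifelse(x < 10, 0, x)

x_replaced   # 0 0 0 11 12 13
x_ifelse     # 0 0 0 11 12 13
x            # still 1 2 3 11 12 13
```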

What Happens When You Mix Types Inside a Vector?

R will create a resulting vector with a mode that can most easily accommodate all the elements it contains. This conversion between modes of storage is called “coercion”. When R converts the mode of storage based on its content, it is referred to as “implicit coercion”. For instance, can you guess what each of the following returns (without running them first)?

c(4, "ch")
c(TRUE, 5)
c(FALSE, 100)
c(TRUE, "ch")

Answer

c(4, "ch")
[1] "4"  "ch"
c(TRUE, 5)
[1] 1 5
c(FALSE, 100)
[1]   0 100
c(TRUE, "ch")
[1] "TRUE" "ch"  

The coercion hierarchy is character > numeric > logical: the result takes the mode of its most flexible element.
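
Implicit coercion of logicals to numbers is useful in practice: TRUE counts as 1 and FALSE as 0, so sum() and mean() can count matches. A minimal sketch with an illustrative variable name:

```r
x = c(1, 2, 3, 11, 12, 13)
below_ten = x < 10   # TRUE TRUE TRUE FALSE FALSE FALSE

sum(below_ten)       # number of elements below 10: 3
mean(below_ten)      # share of elements below 10: 0.5
TRUE + TRUE          # 2
```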

You can also control how vectors are coerced explicitly using the as.*() functions (as.numeric(), as.character(), and so on):

as.numeric(c("1", "2", "3"))
[1] 1 2 3
as.character(1:2)
[1] "1" "2"
as.numeric(c("a")) # "a" cannot be read as a number, so the result is NA (with a warning)
[1] NA
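
Because a failed conversion produces NA rather than an error, it helps to check for NA afterwards. The sketch below uses a made-up input vector and suppressWarnings() to silence the expected warning, then shows two other explicit coercions:

```r
input = c("1", "2", "three", "4")   # hypothetical data with one bad entry

nums = suppressWarnings(as.numeric(input))
nums                  # 1 2 NA 4
input[is.na(nums)]    # the entries that failed to convert: "three"

as.integer(3.7)       # 3 (the fractional part is dropped)
as.logical(0:2)       # FALSE TRUE TRUE (zero is FALSE, non-zero is TRUE)
as.logical("yes")     # NA ("yes" is not a recognised logical string)
```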

Lists

In R, lists act as containers. Unlike atomic vectors, the contents of a list are not restricted to a single mode and can encompass any mixture of data types. Lists are sometimes called generic vectors, because the elements of a list can be any type of R object, even lists containing further lists. This property makes them fundamentally different from atomic vectors.

A list is a special type of vector. Each element can be a different type.

Create a list by hand

Create lists using list() or coerce other objects using as.list().

x = list(1, "a", TRUE)
x
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE
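
The two other points made above, coercing with as.list() and lists that contain lists, can be sketched as follows. The names outer_name and inner are hypothetical:

```r
# as.list() turns each vector element into its own list element
as_list = as.list(1:3)
length(as_list)   # 3
as_list[[2]]      # 2

# A list can hold another list as one of its elements
nested = list(outer_name = "pies", inner = list(1, "a"))
nested$inner[[2]]   # "a"
str(nested)         # compact overview of the nested structure
```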

Access a list

The content of elements of a list can be retrieved by using double square brackets.

x[[1]]
[1] 1
  1. What is the class of x[1]?
  2. What about x[[3]]?
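
One way to check your answers to the two questions above is to ask R directly. This sketch rebuilds x so it runs on its own:

```r
x = list(1, "a", TRUE)

class(x[1])      # single brackets return a list: "list"
class(x[[1]])    # double brackets return the content: "numeric"
class(x[[3]])    # "logical"
length(x[2:3])   # a sub-list with 2 elements
```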

Elements of a list can be named (i.e. lists can have the names attribute).

my_pie = list(type="key lime", diameter=7,
              is.vegetarian=TRUE)

my_pie
$type
[1] "key lime"

$diameter
[1] 7

$is.vegetarian
[1] TRUE
names(my_pie)
[1] "type"          "diameter"      "is.vegetarian"

A list does not print to the console like a vector. Instead, each element of the list starts on a new line.

Elements are indexed by double brackets [[ ]]. Single brackets [ ] will still return a(nother) list. If the elements of a list are named, they can also be referenced with the $ notation:

my_pie$type
[1] "key lime"
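
Named elements can also be read with a name inside the brackets, and the $ notation works for changing, adding and removing elements. A minimal sketch; the price element and the helper variable names are hypothetical:

```r
my_pie = list(type = "key lime", diameter = 7, is.vegetarian = TRUE)

by_name = my_pie[["diameter"]]   # same value as my_pie$diameter
sub_list = my_pie["diameter"]    # single brackets keep the list wrapper

my_pie$diameter = 9              # overwrite an existing element
my_pie$price = 12                # add a new element
names_after_add = names(my_pie)

my_pie$price = NULL              # assigning NULL removes the element
names_after_drop = names(my_pie)
```
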

A data frame can be built with data.frame(), with one named column per argument:

dat = data.frame(id = letters[1:5], x = 1:5, y = 16:20)
dat
  id x  y
1  a 1 16
2  b 2 17
3  c 3 18
4  d 4 19
5  e 5 20

Note that a data frame is actually a special kind of list:

is.list(dat)
[1] TRUE
class(dat)
[1] "data.frame"
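
Since a data frame is a list of columns, the list tools from earlier apply to it, and it adds row and column indexing of its own. A sketch of several ways to access the data, rebuilding dat so it runs on its own:

```r
dat = data.frame(id = letters[1:5], x = 1:5, y = 16:20)

length(dat)        # number of columns (list elements): 3
names(dat)         # "id" "x" "y"

dat$x              # a column as a vector, by name
dat[["y"]]         # the same idea with double brackets
class(dat["x"])    # single brackets keep the wrapper: "data.frame"

dat[2, "y"]        # row 2 of column y: 17
dat[dat$x > 3, ]   # rows selected by a logical vector
```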

Data Frame vs Vector vs List

Recap

  • Can you list different ways to access data in a data frame?
  • What is the difference between a vector and a list?